ABSTRACT
Spoken term detection (STD) is a fundamental task in spoken information retrieval. Unlike conventional speech transcription and keyword spotting, STD is an open-vocabulary task and must therefore handle out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phonemes, are widely used to address the OOV issue; however, performance on OOV terms remains significantly worse than on in-vocabulary (INV) terms.
The performance degradation on OOV terms can be attributed to many factors. The factor we address in this paper is that the acoustic and language models used for speech transcription are highly vulnerable to OOV terms, which leads to unreliable confidence measures and error-prone detections.
A direct posterior confidence measure derived from discriminative models has previously been proposed for STD. In this paper, we use this technique to tackle the weakness of confidence estimation for OOV terms. Because neither acoustic models nor language models are involved in its computation, the new confidence avoids the weak modeling problem with OOV terms. Our experiments, conducted on multi-party meeting speech that is highly spontaneous and conversational, demonstrate that the proposed technique significantly improves STD performance on OOV terms; when it is combined with conventional lattice-based confidence, a significant improvement is obtained on both INV and OOV terms. Furthermore, the new confidence measure can be combined with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and term-dependent confidence discrimination, which leads to an integrated solution for OOV STD with greatly improved performance.
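As a rough illustration of the idea only, and not the formulation used in the paper, the sketch below scores a putative detection directly from frame-level phone posteriors (such as a discriminative classifier might output) and then interpolates that score with a lattice-based confidence. The function names, the geometric-mean scoring rule, the posterior floor, and the linear interpolation weight are all assumptions made for this example.

```python
import math


def direct_posterior_confidence(frame_posteriors, alignment, floor=1e-10):
    """Geometric mean of the aligned phone posteriors over a detection.

    frame_posteriors: list of dicts mapping phone label -> posterior,
        one dict per frame (hypothetical classifier output).
    alignment: list of phone labels, one per frame, for the detected term.
    floor: assumed lower bound that keeps log() defined for missing phones.
    """
    if not alignment or len(frame_posteriors) != len(alignment):
        raise ValueError("need one aligned phone label per frame")
    log_sum = sum(
        math.log(max(frame.get(phone, 0.0), floor))
        for frame, phone in zip(frame_posteriors, alignment)
    )
    # Averaging in the log domain normalises for detection length.
    return math.exp(log_sum / len(alignment))


def combine_confidences(direct_conf, lattice_conf, weight=0.5):
    """Linear interpolation of two confidences (an assumed combination rule)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * direct_conf + (1.0 - weight) * lattice_conf


# Toy detection spanning two frames, both aligned to phone "a".
frames = [{"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}]
direct = direct_posterior_confidence(frames, ["a", "a"])
combined = combine_confidences(direct, lattice_conf=1.0, weight=0.5)
```

Note that the direct score in this sketch uses no acoustic-model likelihood and no language-model probability, which is the property the abstract relies on for OOV terms; in practice the interpolation weight would have to be tuned on held-out data.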